CHAPTER 20 Getting the Hint from Epidemiologic Inference 295

covariate, it does not meet the rules, so you leave it out. You keep doing this

until you run out of variables. Although forward stepwise can work if you have

very few variables, most analysts do not use this approach because it has been

shown to be sensitive to the order in which you enter the variables.

» Backward elimination: In this approach, the first model you run contains all

your potential covariates, including all the confounders and the exposure.

Using modeling rules, each time you run the model, you remove or eliminate

the confounder contributing the least to the model. You decide which one

that is based on modeling rules you set (such as which confounder has the

largest p value). Theoretically, after you pare away the confounders that do

not meet the rules, you will have a final model. In practice, this process can

run into problems if you have collinear covariates (see Chapters 17 and 18 for

a discussion of collinearity). Your first model, filled with all your potential

covariates, may fail to converge and error out for this reason. Also, it is not

clear whether once you eliminate a covariate you should try it again in the

model. This approach often sounds better on paper than it works in practice.

» Stepwise selection: This approach combines the best of forward stepwise

and backward elimination. Starting with the same set of candidate covariates,

you choose which covariate to introduce first into a model with the exposure.

If this covariate meets modeling rules, it is kept, and if not, it is left out. This

continues along as if you are doing forward stepwise — but then, there’s a

twist. After you are done trying each covariate and you have your forward

stepwise model, you go back and try to add back the covariates you left out

one by one. Each time one seems to fit back in, you keep it and consider it part

of the working model. It is during this phase that collinearity between covariates

can become very apparent. After you retry the covariates you originally left

out and are satisfied that you were able to add back the ones that fit the

modeling rules, you can declare that you have arrived at the final model.
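
To make the backward elimination loop from the list above concrete, here is a minimal Python sketch. It assumes a modeling rule of "drop the confounder with the largest p value while that p value is above 0.05"; the 0.05 cutoff, the function names, and the covariate names are all assumptions made for illustration. The `toy_fit` function stands in for your real statistical software and returns made-up p values. It is rigged so that two collinear covariates (`bmi` and `weight`) mask each other until one is removed.

```python
from typing import Callable, Dict, List, Tuple

# A fit function takes the covariate names in the model and returns a p value
# for each one. In real work it would wrap your statistical software.
FitFunction = Callable[[List[str]], Dict[str, float]]


def backward_elimination(
    exposure: str,
    confounders: List[str],
    fit: FitFunction,
    p_remove: float = 0.05,  # assumed modeling rule, not a universal standard
) -> Tuple[List[str], List[Tuple[List[str], Dict[str, float]]]]:
    """Start with every covariate, then repeatedly drop the confounder with
    the largest p value until all remaining confounders meet the rule.
    The exposure is never dropped."""
    kept = list(confounders)
    history = []  # one entry per model run: (covariates, p values)
    while True:
        model = [exposure] + kept
        p_values = fit(model)
        history.append((model, p_values))
        if not kept:
            break
        worst = max(kept, key=lambda name: p_values[name])
        if p_values[worst] <= p_remove:
            break  # every remaining confounder meets the rule
        kept.remove(worst)
    return [exposure] + kept, history


def toy_fit(covariates: List[str]) -> Dict[str, float]:
    """Placeholder for a real regression. All p values here are MADE UP."""
    p = {"smoking": 0.01, "age": 0.03, "income": 0.40, "bmi": 0.02, "weight": 0.04}
    if "bmi" in covariates and "weight" in covariates:
        # Hypothetical collinear pair: each masks the other while both are in.
        p["bmi"], p["weight"] = 0.30, 0.60
    return {name: p[name] for name in covariates}


final_model, history = backward_elimination(
    exposure="smoking",
    confounders=["age", "income", "bmi", "weight"],
    fit=toy_fit,
)
for covariates, p_values in history:
    print(covariates, p_values)
print("Final model:", final_model)
```

In this toy run, `weight` is dropped first, then `income`, and `bmi` is retained once its collinear partner is gone. The sketch never retries an eliminated covariate, which is one of the open questions the backward elimination approach leaves you with.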

Once you produce your final model, check the p value for the covariate or

covariates representing your exposure. If they are not statistically significant, it

means that your data did not support your hypothesis: after controlling for

confounding, your exposure was not statistically significantly associated with the outcome. However,

if the p value is statistically significant, then you would move on to interpret the

results for your exposure covariates from your regression model. After controlling

for confounding, your exposure was statistically significantly associated with

your outcome. Yay!

Use a spreadsheet to keep track of each model you run and a summary of the

results. Save this in addition to your computer code for running the models. It can

help you communicate with others about why certain covariates were retained and

others were dropped from your final model.
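
If you would rather build the tracking spreadsheet from code than by hand, one possible approach is to append a row to a CSV file each time you run a model. The sketch below uses only Python's standard library. The column names, file name, and the example rows (including the p values) are hypothetical and exist only to show the shape of the log.

```python
import csv
import os
import tempfile

# Assumed columns for the log; adjust them to whatever you need to report.
FIELDS = ["model_id", "covariates", "exposure_p_value", "decision"]


def log_model(path, model_id, covariates, exposure_p_value, decision):
    """Append one row describing a fitted model; write the header on first use."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "model_id": model_id,
            "covariates": " + ".join(covariates),
            "exposure_p_value": exposure_p_value,
            "decision": decision,
        })


def read_log(path):
    """Read the log back as a list of dictionaries, one per model."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


# Example with made-up results, written to a temporary folder.
log_path = os.path.join(tempfile.mkdtemp(), "model_log.csv")
log_model(log_path, 1, ["smoking", "age", "income"], 0.02,
          "dropped income (largest p value)")
log_model(log_path, 2, ["smoking", "age"], 0.01, "final model")

rows = read_log(log_path)
for row in rows:
    print(row)
```

The resulting CSV file opens directly in any spreadsheet program, and the `decision` column gives you a place to record why each covariate was retained or dropped.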